library(igraph)
library(here)

1 Load edge list

# Step 1: Read the edge list from the .txt file
file_path <- here("lecture-2/collaboration.edgelist.txt")
edge_list <- read.table(file_path, sep = "\t", header = FALSE)
# Convert to a matrix, the format igraph expects for an edge list
matrix1 <- as.matrix(edge_list)
head(matrix1)
class(matrix1)
## [1] "matrix" "array"

2 Attempt at plotting a graph

# Step 2: Shift vertex IDs by one so they start from 1 (igraph's R interface is 1-based)
adjusted_edge_list <- as.matrix(edge_list + 1)

net1 <- graph_from_edgelist(adjusted_edge_list, directed = FALSE)

library(RBioFabric)
bioFabric(net1)
## Warning in graph.bfs(bfGraphPass1, startIndex, neimode = "all"): Argument
## `neimode' is deprecated; use `mode' instead
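
The ID shift in Step 2 can be tried on a small example first. The sketch below uses a hypothetical 0-based edge list (not the lecture data) and assumes a recent igraph version in which graph_from_edgelist is available.

```r
library(igraph)

# Hypothetical 0-based edge list: a triangle (0-1-2) plus one separate edge (3-4)
toy_edges <- matrix(c(0, 1,
                      1, 2,
                      2, 0,
                      3, 4), ncol = 2, byrow = TRUE)

# A numeric edge list is read as vertex IDs, and those start at 1 in R,
# so every ID is shifted by one before building the graph
toy_net <- graph_from_edgelist(toy_edges + 1, directed = FALSE)

vcount(toy_net)  # number of vertices
ecount(toy_net)  # number of edges
```

Without the shift, the ID 0 is not a valid vertex ID for a numeric edge list in R.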

3 Graph Analysis

3.1 Size & Density

# Number of nodes
nodes <- igraph::vcount(net1)

# Number of edges
edges <- igraph::gsize(net1)

# Density
density <- igraph::graph.density(net1)

# Density lies between 0 and 1: 0 is an empty graph (no edges) and
# 1 is a complete graph (every possible edge present).

print(paste("Number of Nodes:", nodes))
## [1] "Number of Nodes: 23133"
print(paste("Number of Edges:", edges))
## [1] "Number of Edges: 93436"
print(paste("Density:", density))
## [1] "Density: 0.000349219987280582"
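
For an undirected graph without self-loops, density is the number of edges divided by the number of possible node pairs. As a sanity check, the sketch below recomputes the value from the two counts printed above using base R only; the helper name undirected_density is illustrative and not part of igraph.

```r
# Density of an undirected simple graph: edges / possible node pairs
# (hypothetical helper, written only for this check)
undirected_density <- function(n_nodes, n_edges) {
  possible_pairs <- n_nodes * (n_nodes - 1) / 2
  n_edges / possible_pairs
}

# Counts copied from the printed output above
manual_density <- undirected_density(23133, 93436)
manual_density
```

The result should agree with the density reported by igraph, which shows how sparse the network is: only a tiny fraction of all possible pairs are linked.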

3.2 Degree Centrality Plot

# Calculate degree centrality for each vertex
degree_centrality <- degree(net1)

# Create a histogram of degree centrality
hist(degree_centrality, main = "Degree Centrality Distribution",
     xlab = "Degree Centrality", ylab = "Frequency", col="darkblue")

The histogram shows that most nodes have a low degree, i.e. they are linked to only a few other nodes.
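
A frequency table is a compact alternative to the histogram when we want exact counts per degree. The sketch below uses a hypothetical five-node star graph rather than the lecture data, so the numbers are easy to verify by hand.

```r
library(igraph)

# Hypothetical star graph: vertex 1 is the centre, vertices 2 to 5 are leaves
star <- make_star(5, mode = "undirected")

# Degree of every vertex: the centre has 4 links, each leaf has 1
deg <- degree(star)
deg

# How many vertices have each degree value
table(deg)
```

On the full network the same table(degree(net1)) call lists how many nodes have each degree.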

3.3 Diameter

  • The diameter is the shortest distance between the two most distant nodes in the network. In other words, once the shortest path length between every pair of nodes has been calculated, the diameter is the longest of these path lengths.
# Calculate the diameter of the graph
diameter <- diameter(net1)
print(paste("Diameter of the graph:", diameter))
## [1] "Diameter of the graph: 15"
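
To make the definition concrete, here is a minimal sketch on two hypothetical graphs. It relies on the default unconnected = TRUE of igraph's diameter function, under which a disconnected graph reports the largest diameter found within any single component.

```r
library(igraph)

# Hypothetical path graph 1 - 2 - 3 - 4
path_edges <- matrix(c(1, 2,
                       2, 3,
                       3, 4), ncol = 2, byrow = TRUE)
path_net <- graph_from_edgelist(path_edges, directed = FALSE)

# The two end points are three steps apart, and no pair is further apart
diameter(path_net)

# Add a separate edge 5 - 6, which makes the graph disconnected
split_net <- graph_from_edgelist(rbind(path_edges, c(5, 6)), directed = FALSE)

# With the default settings, the longest path inside one component is reported
diameter(split_net)
```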

3.4 Connectedness

igraph::is.connected(net1)
## [1] FALSE
  • The check above shows that the network is disconnected, so it can be split into its connected components.

  • Let us analyse the largest component.
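
Before extracting anything, it can help to see how many components there are and how large each one is. The sketch below does this with igraph's components function on a hypothetical five-node graph, not on the lecture data.

```r
library(igraph)

# Hypothetical graph with two components: 1 - 2 - 3 and 4 - 5
toy_edges <- matrix(c(1, 2,
                      2, 3,
                      4, 5), ncol = 2, byrow = TRUE)
toy_net <- graph_from_edgelist(toy_edges, directed = FALSE)

is_connected(toy_net)      # FALSE, because there is no path from 1 to 4

comp <- components(toy_net)
comp$no                    # number of components
comp$csize                 # size of each component
comp$membership            # component ID of every vertex
```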

4 Largest Components Analysis

In most network analysis research, network features are computed on the largest connected component of a graph (Newman 2010). To extract it from an igraph or network object, CINNA provides the giant_component_extract function, which we can use as follows:

4.1 Load CINNA for further analyses

library(CINNA)

4.2 Identify the Largest Component:

largest_component_graph <- CINNA::giant_component_extract(net1)[[1]]
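
For comparison, the same extraction can be sketched with igraph alone. This is an alternative under stated assumptions, not the CINNA implementation: it uses a hypothetical toy graph and picks the component with the most vertices.

```r
library(igraph)

# Hypothetical graph: a triangle (1, 2, 3) and a separate edge (4 - 5)
toy_edges <- matrix(c(1, 2,
                      2, 3,
                      1, 3,
                      4, 5), ncol = 2, byrow = TRUE)
toy_net <- graph_from_edgelist(toy_edges, directed = FALSE)

# Label every vertex with its component, then keep the biggest component
comp <- components(toy_net)
giant_ids <- which(comp$membership == which.max(comp$csize))
giant <- induced_subgraph(toy_net, giant_ids)

vcount(giant)        # vertices in the largest component
ecount(giant)        # edges in the largest component
is_connected(giant)  # the extracted subgraph is connected
```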

4.3 Largest Component properties

# Number of nodes
largest_nodes <- igraph::vcount(largest_component_graph)

# Number of edges
largest_edges <- igraph::gsize(largest_component_graph)

# Density
largest_density <- igraph::graph.density(largest_component_graph)

print(paste("Subgraph is connected:", is.connected(largest_component_graph)))
## [1] "Subgraph is connected: TRUE"
print(paste("Number of Nodes:", largest_nodes))
## [1] "Number of Nodes: 21362"
print(paste("Number of Edges:", largest_edges))
## [1] "Number of Edges: 91283"
print(paste("Density:", largest_density))
## [1] "Density: 0.000400088814343288"

4.4 Suitable centrality measures

Not every centrality measure is appropriate for every type of network. To find out which ones are suitable, CINNA provides the proper_centralities function, which selects the applicable centrality types based on network topology. We can use it as follows:

proper_centrality <- head(CINNA::proper_centralities(largest_component_graph), 1)
##  [1] "subgraph centrality scores"                      
##  [2] "Topological Coefficient"                         
##  [3] "Average Distance"                                
##  [4] "Barycenter Centrality"                           
##  [5] "BottleNeck Centrality"                           
##  [6] "Centroid value"                                  
##  [7] "Closeness Centrality (Freeman)"                  
##  [8] "ClusterRank"                                     
##  [9] "Decay Centrality"                                
## [10] "Degree Centrality"                               
## [11] "Diffusion Degree"                                
## [12] "DMNC - Density of Maximum Neighborhood Component"
## [13] "Eccentricity Centrality"                         
## [14] "Harary Centrality"                               
## [15] "eigenvector centralities"                        
## [16] "K-core Decomposition"                            
## [17] "Geodesic K-Path Centrality"                      
## [18] "Katz Centrality (Katz Status Index)"             
## [19] "Kleinberg's authority centrality scores"         
## [20] "Kleinberg's hub centrality scores"               
## [21] "clustering coefficient"                          
## [22] "Lin Centrality"                                  
## [23] "Lobby Index (Centrality)"                        
## [24] "Markov Centrality"                               
## [25] "Radiality Centrality"                            
## [26] "Shortest-Paths Betweenness Centrality"           
## [27] "Current-Flow Closeness Centrality"               
## [28] "Closeness centrality (Latora)"                   
## [29] "Communicability Betweenness Centrality"          
## [30] "Community Centrality"                            
## [31] "Cross-Clique Connectivity"                       
## [32] "Entropy Centrality"                              
## [33] "EPC - Edge Percolated Component"                 
## [34] "Laplacian Centrality"                            
## [35] "Leverage Centrality"                             
## [36] "MNC - Maximum Neighborhood Component"            
## [37] "Hubbell Index"                                   
## [38] "Semi Local Centrality"                           
## [39] "Closeness Vitality"                              
## [40] "Residual Closeness Centrality"                   
## [41] "Stress Centrality"                               
## [42] "Load Centrality"                                 
## [43] "Flow Betweenness Centrality"                     
## [44] "Information Centrality"                          
## [45] "Dangalchev Closeness Centrality"                 
## [46] "Group Centrality"                                
## [47] "Harmonic Centrality"                             
## [48] "Local Bridging Centrality"                       
## [49] "Wiener Index Centrality"
head(proper_centrality, 10)
## [1] "subgraph centrality scores"
  • Notice that both closeness and degree centrality appear in the list, so let us explore these two measures.

4.5 Degree centrality

head(
  sort(
    calculate_centralities(
      largest_component_graph, 
      include = "Degree Centrality"
      )[[1]], # take the first element: the function returns a list of length 1
    decreasing = TRUE
    ), 
  20
  )
##  [1] 279 252 201 190 182 165 158 148 142 138 131 130 128 125 124 124 122 120 119
## [20] 111
  • Some nodes are connected to a very large number of other nodes, with degrees of up to 279. The network therefore contains hubs with high degree centrality.

4.6 Closeness centrality

head(
  sort(
    calculate_centralities(
      largest_component_graph, 
      include = "Closeness Centrality (Freeman)"
      )[[1]], # take the first element: the function returns a list of length 1
    decreasing = TRUE
    ), 
  20
  )
##  [1] 1.397507e-05 1.335613e-05 1.319435e-05 1.300339e-05 1.291506e-05
##  [6] 1.288992e-05 1.287813e-05 1.277792e-05 1.270035e-05 1.261209e-05
## [11] 1.259525e-05 1.256281e-05 1.253698e-05 1.253243e-05 1.250625e-05
## [16] 1.250234e-05 1.249282e-05 1.247069e-05 1.246587e-05 1.245733e-05
  • Note that even the highest closeness value is very small in absolute terms. Closeness is defined as the inverse of the sum of distances to all the other vertices in the graph, so with more than 21,000 vertices that sum is large and the raw score is necessarily tiny. On this unnormalized scale, the network has low closeness.
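
To get a feel for the scale of these scores, a raw closeness value can be turned back into an average distance. The sketch below uses only the top score printed above and the node count from section 4.3, and assumes the score is the plain inverse of the summed distances, as defined above; the variable names are illustrative.

```r
# Values copied from the printed output in this notebook
top_closeness <- 1.397507e-05  # highest raw closeness score
n_nodes <- 21362               # nodes in the largest component

# Raw closeness = 1 / (sum of distances to all other vertices),
# so inverting it recovers the summed distance
total_distance <- 1 / top_closeness

# Average distance from the top-ranked node to each of the other nodes
mean_distance <- total_distance / (n_nodes - 1)
round(mean_distance, 2)
```

Under these numbers, the top-ranked node is on average a little over three steps away from any other node in the component.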

Thus, we can draw two conclusions for this largest component:

  • Low raw closeness
  • A few nodes with very high degree

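4.7 Clustering Coefficient/ Transitivity

The local clustering coefficient focuses on individual nodes and their immediate neighborhoods. It quantifies how close a node’s neighbors are to forming a complete subgraph, i.e. the probability that two neighbors of a vertex are also connected to each other. Called without a type argument, igraph::transitivity returns the global version of this measure for the whole graph.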

transitivity <- igraph::transitivity(largest_component_graph)
transitivity
## [1] 0.2618218
  • A transitivity of about 0.26 means that roughly a quarter of connected triples close into triangles. Most pairs of neighbours are therefore not directly connected to one another, which points to a loosely clustered network.
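
The difference between the global and the local measure is easiest to see on a small case. The sketch below uses a hypothetical four-node graph, a triangle with one pendant vertex, where both values can be checked by hand.

```r
library(igraph)

# Hypothetical graph: triangle 1 - 2 - 3 plus a pendant vertex 4 attached to 3
toy_edges <- matrix(c(1, 2,
                      2, 3,
                      1, 3,
                      3, 4), ncol = 2, byrow = TRUE)
toy_net <- graph_from_edgelist(toy_edges, directed = FALSE)

# Global transitivity: 3 * triangles / connected triples = 3 * 1 / 5
global_cc <- transitivity(toy_net, type = "global")
global_cc

# Local clustering coefficient per vertex: vertices 1 and 2 have all their
# neighbours connected, vertex 3 has one connected pair out of three, and
# vertex 4 has a single neighbour, so its value is undefined (NaN)
local_cc <- transitivity(toy_net, type = "local")
local_cc
```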

5 Visualization of centrality analysis

After evaluating centrality measures, highlighting the nodes with high centrality values gives the researcher an overall insight into the network. The visualize_graph function illustrates the input graph based on a specified centrality value. If the centrality values have already been computed, the computed.centrality.value argument is recommended. Otherwise, the function computes the centrality named in the centrality.type argument. For practice, we specify Degree Centrality here:

visualize_graph(largest_component_graph, centrality.type = "Degree Centrality")

# Plot the largest component
plot(largest_component_graph)